Bidirectional LSTM Networks Employing Stacked Bottleneck Features for Expressive Speech-Driven Head Motion Synthesis
نویسندگان
چکیده
Previous work in speech-driven head motion synthesis is centred around Hidden Markov Model (HMM) based methods and data that does not show a large variability of expressiveness in both speech and motion. When using expressive data, these systems often fail to produce satisfactory results. Recent studies have shown that using deep neural networks (DNNs) results in a better synthesis of head motion, in particular when employing bidirectional long short-term memory (BLSTM). We present a novel approach which makes use of DNNs with stacked bottleneck features combined with a BLSTM architecture to model context and expressive variability. Our proposed DNN architecture outperforms conventional feed-forward DNNs and simple BLSTM networks in an objective evaluation. Results from a subjective evaluation show a significant improvement of the bottleneck architecture over feed-forward DNNs.
منابع مشابه
BLSTM neural networks for speech driven head motion synthesis
Head motion naturally occurs in synchrony with speech and carries important intention, attitude and emotion factors. This paper aims to synthesize head motions from natural speech for talking avatar applications. Specifically, we study the feasibility of learning speech-to-head-motion regression models by two types of popular neural networks, i.e., feed-forward and bidirectional long short-term...
متن کاملAn Information-Theoretic Discussion of Convolutional Bottleneck Features for Robust Speech Recognition
Convolutional Neural Networks (CNNs) have been shown their performance in speech recognition systems for extracting features, and also acoustic modeling. In addition, CNNs have been used for robust speech recognition and competitive results have been reported. Convolutive Bottleneck Network (CBN) is a kind of CNNs which has a bottleneck layer among its fully connected layers. The bottleneck fea...
متن کاملDeep Stacked Bidirectional and Unidirectional LSTM Recurrent Neural Network for Network-wide Traffic Speed Prediction
Short-term traffic forecasting based on deep learning methods, especially long-term short memory (LSTM) neural networks, received much attention in recent years. However, the potential of deep learning methods is far from being fully exploited in terms of the depth of the architecture, the spatial scale of the prediction area, and the prediction power of spatial-temporal data. In this paper, a ...
متن کاملPredicting Head Pose from Speech with a Conditional Variational Autoencoder
Natural movement plays a significant role in realistic speech animation. Numerous studies have demonstrated the contribution visual cues make to the degree we, as human observers, find an animation acceptable. Rigid head motion is one visual mode that universally cooccurs with speech, and so it is a reasonable strategy to seek a transformation from the speech mode to predict the head pose. Seve...
متن کاملSpeech-driven head motion synthesis using neural networks
This paper presents a neural network approach for speech-driven head motion synthesis, which can automatically predict a speaker’s head movement from his/her speech. Specifically, we realize speech-to-head-motion mapping by learning a multi-layer perceptron from audio-visual broadcast news data. First, we show that a generatively pre-trained neural network significantly outperforms a randomly i...
متن کامل